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Abstract 

A three-dimensional, parallelized implementation of the electromagnetic rel¬ 
ativistic moment implicit particle-in-cell method in Cartesian geometry [Ij 
is presented. Particular care was taken to keep the C++11 codebase simple, 
concise, and approachable. GMRES is used as a field solver and during the 
Newton-Krylov iteration of the particle pusher. Drifting Maxwellian prob¬ 
lem setups are available while more complex simulations can be implemented 
easily. Several test runs are described and the code’s numerical and compu¬ 
tational performance is examined. Weak scaling on the SuperMUC system 
is discussed and found suitable for large-scale production runs. 
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Computer: Program should work on any system with a modern C++11 Com¬ 
piler (e.g. g++ in GCC 4.7 and later) and MPI, HDF5 implementations 
Operating systems: Linux / Unix 

RAM: Variable, depending on simulation size, ~ 2 KiB per cell, 56 B per 
particle 

Parallelization: Parallelized using the Message Passing Interface, successfully 
tested on SuperMUC with good scaling behavior. 

Build system: Makefiles 
External libraries: 

Eigen3 (header hies, http://eigen.tuxfamily.org, tested with versions 
3.2.1, 3.2.2), 

MPI2 (e.g. OpenMPI, http://open-mpi.org, tested with version 1.8.1), 
HDF5 1.8 (http://hdfgroup.org/HDF5, tested with version 1.8.13) 

Nature of problem: Kinetic simulations of collisionless plasma mostly need 
to resolve the smallest scales in a plasma, limiting the problem domains that 
can be tackled. The Courant-Friedrichs-Lewy condition poses further prob¬ 
lems. Explicit algorithms require large amounts of computational power to 
cope with these restrictions. Implementations of implicit algorithms, on the 
other hand, are very complex. Very few implicit codes are openly available 
and approachable. Fully relativistic, three-dimensional electromagnetic im¬ 
plicit PiC codes in particular are rare in general. 

Solution method: PICPANTHER implements the relativistic moment im¬ 
plicit particle-in-ccll method. The implicit electric held equation is solved 
using the GMRES algorithm with operators represented as sparse matri¬ 
ces. For each particle, the implicit equation of motion is solved via a robust 
Newton-Krylov scheme. Parallelization is achieved using simple domain de¬ 
composition, resulting in good scalability. 

Restrictions: PICPANTHER only allows for Euclidean geometries. Cur¬ 
rently, only periodic boundary conditions are provided. 

Unusual features: PICPANTHER makes use of advanced numerical tech¬ 
niques (GMRES, Newton-Krylov) to implicitly solve relativistic versions of 
the movement and held equations of a PiC code. It was designed to be sim¬ 
ple and concise, using advanced C++11 language features. Moreover, it is 
parallelized and exhibits good scaling behavior. 

Running time: Minutes to days, depending on problem size and CPU count 
Lines of code: 1988 

License provisions: CPC non-profit use license agreement 
CPC Library Classification: 19.3: Collisionlcss Plasmas 
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1. Introduction 

Plasma physics and especially plasma astrophysics has undergone several 
changes in the past decades. For a long time fluid phenomena have dom¬ 
inated the research, but there was pressure to change the view of plasmas 
from two sides: Observations from high energy astrophysics demanded for an 
explanation of the acceleration process for very high energy charged particles 
[2] - On the other hand, the significant fraction of non-thermal particles in 
the interstellar and intergalactic medium requires a detailed understanding 
of instabilities caused by these particles. 

Kinetic plasma descriptions, required to understand and model the non- 
thermal components of a plasma, come in a wide variety. A full description 
of kinetic plasmas requires the solution of the Boltzmann equation coupled 
with Maxwell’s equations and a complicated collision operator. Finding an 
analytical or even a numerical solution for this problem set seems impossible 
at the moment. One usually resorts to fully non-collisional plasmas, which 
can be described by the Vlasov-Maxwell system. Still, this simplification 
yields a six-dimensional phase space problem that is too memory intensive 
to be solved for general three-dimensional problems. 

The Particle-in-Cell (PiC) method pH 0J has established itself as a state 
of the art method for solving problems in kinetic plasma physics. It is a 
compromise between direct particle interaction, i.e. molecular dynamics or 
N-body codes, and field only methods (Vlasov codes [5]). The main ad¬ 
vantages of the PiC method are that their memory consumption increases 
linearly with the simulated volume and that the runtime is only of order 
N. They are also very suitable for the use of large multi-processor systems. 
Their main disadvantages, on the other hand, are high noise levels and high 
computational requirements due to the operation on the shortest time and 
length scales. Implicit methods like the one used in our code can alleviate the 
computational burden by allowing for larger timesteps and cell sizes, making 
larger scale simulations a possibility. 

As has been discussed in the literature M, the general outline of PiC 
codes is very simple and can be summarized in four stages: Current assign¬ 
ment, field propagation, force calculation and particle movement. A similar 
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simplicity was aimed for during the development of this code. Particular 
care is taken to keep complexity low by adhering to modern C++ language 
features and paradigms. Despite the numerical techniques being highly ad¬ 
vanced, the code itself is, therefore, easy to understand. The result is PIC- 
PANTHER (Parallel Implicit Concise PiC Allowing Non-THermal Electro¬ 
magnetic Relativity) a scalable, implicit, three-dimensional PiC code imple¬ 
menting the relativistic moment method p.J with GMRES [6] and Newton- 
Krylov [2] solvers for fields and particles respectively. 

2. Definitions 

An extensive derivation of the relativistic moment implicit particle-in-ccll 
method in Cartesian geometry is given in [T] . The quantities from [I] that 
are relevant to this paper are summarized in the following paragraphs with 
their derivations omitted here for brevity. In this section, particle positions 
are denoted by x and particle velocities by i? or u — qu with the gamma 
factor 7 . SI units are used throughout this paper and the code. 

Firstly, a ’’magnetic field rotation tensor”, often encountered in implicit 


particle- 

-in-cell schemes [SIS], needs to be calculated for each individual 

par- 

tide: 

a = ^(i - P'T[B n ] + p ,2 B n ® 

(1) 

where 

P 2 m ’ 

(2) 


T=^E n -v + 1 , 
c 2. 

(3) 


p = t 

r’ 

(4) 


D = r(l + /3 ,2 B ■ B), 

(5) 


with the particle charge q and mass m, the simulation timestep At, and 
T[B n ] being the skew-symmetric matrix representing the cross-product of 
B n with an arbitrary vector. This tensor is a rotation transformation due to 
the magnetic field. In a non-relativistic implementation, a does not need to 
be calculated for each particle individually. It can be computed directly on 
the grid instead, with it being defined per species due to the gamma factor 
being unity. 
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The source terms required for the solution of Maxwell’s equations are 
the charge density p, the current density j, the pressure tensor II, and the 
dielectric tensor p: 


p(£g) 

= W(x-x g )-J2^ 

q, 

(6) 

j(x e ) 

= W(x-x g )-Y,^ 

q(a-u ), 

(7) 

ri(T g ) 

= IU(f-T g )-^ — 

q(a ■ u) ® (d • u), 

(8) 

g) 

_ | 

= fU(f-T g )-^ — 

Q(qAt) 2 „ 

- a. 

2 £ 0 m 

(9) 


The dimensionless filtering parameter 0 is introduced below. The edge length 
of a single cubic cell is represented by Ax and the vacuum permittivity by 
£q. These quantities are deposited in the standard particle-in-ccll manner, 
summing over all macro-particles weighted with an interpolation function W. 
Usually, W(x) is a product of splines s m (x), s m (y), s m (z) of some order m. 
Choosing a suitable m is necessarily a compromise between high computa¬ 
tional performance (low order) and low numerical noise (high order). 

The terms p, j, H make up the actual charge density p to be used in the 
calculation of the electric field, 

P=P- ( 6 Ai)V ■ (j“Av ■ ft) . (10) 

After several algebraic manipulations, an implicit expression for the electric 
field E n+@ is obtained, 


c0Af ) 2 ( -V 2 U n+e + VV • ( p ■ E 


+ [I-p ).E n+ ° = 


E n - c 2 0Af (poJ~ ■ fl - V x bA - (c0At) 2 V j-, (11) 


with 0 being a filtering parameter, 0 € [0.5,1.0], that controls the propaga¬ 
tion of electromagnetic waves. Choosing 0 = 1.0 almost completely removes 
the electromagnetic mode from the simulation. By contrast, 0 = 0.5 results 
in a second order correct timestepping, the EM mode staying intact. E n+1 
is calculated by linear extrapolation. 


S „+1 = £“ +e + (9 - 1 )E 

e 


( 12 ) 
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The magnetic field is updated according to Faraday’s law. 


/;” 11 — B n — Ai f V x E" +e ) 


( 13 ) 


From the equations of motion, a nonlinear residual equation is derived 
for the update of a single particle, 


r(u n+1 ) = 


» n+1 - u n _q_f g n +e,£ n +i,2s + u n+1 + u n 

At m \ 
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n +1 


+ 7 " 


x B n (x n+l/2 ) 


(14) 

Of note here is the fact that the electromagnetic fields in this equation are 
required at the mid-orbit position of the particle x n+1 / 2 . Consequently, the 
equation needs to be solved iteratively. During and after the solution of 


equation (14), the particle’s location is updated according to 


x 


71+1 


= x n +At 


u 


71+1 


+ U 1 


7' 


71+1 


+ Y 


(15) 


The definitions above are not exactly equal to the ones in jT] due to small 
oversights being corrected. The expression for ft, contains an additional factor 


in equations (11) and (13) 


q. In equation (11), a factor of /io was added. Finally, signs where changed 


Being a more general algorithm, the scheme outlined above can be mod¬ 
ified easily to yield a simpler, non-rclativistic version [10], 


3. Discretization 

First order finite differences are employed to obtain the spatial derivatives 
needed for the vector operators. 


\ _ ++l,j,k +,j,k 

dx J i+l/2,j,k 


Where required, missing values are taken as an average: 

+j+l,k + +j,k + +j,k+l + +j+l,k+l 


(16) 


+j+l/2,k+l/2 — 


(17) 


Two possible distributions of grid quantities were tested. First, the lattice 
arrangement is chosen such that the electric field E along with j and fi are 


6 
















located at cell nodes. The magnetic field B , p, and A are defined at cell 
centers. Second, the quantities are arranged such that the electric and mag¬ 
netic fields constitute a standard Yee-lattice m- This can be realized by 
depositing the components of E and j on cell edges parallel to the component 
direction and the components of B on cell faces perpendicular to the compo¬ 
nent direction. The charge density p is located at cell nodes, the components 
of fl such that their divergence is evaluated at cell edges and p, such that 
/}. • E results in a vector stored like E. Expressing the gradient, divergence, 
and curl at cell nodes and centers is straightforward in both cases, if tedious. 
Since the discretized operators are matrices, two operators can be combined 
easily by a simple matrix multiplication. 

For the code, the Yee arrangement was chosen as default. The resulting 
matrices are simpler and contain fewer elements. Moreover, the dispersion 
relations obtained below are clearer with modes being more pronounced. 
The cell-centered / node-centered scheme is still available. Unfortunately, 
switching schemes requires the replacement of three source code files. 


4. Code 


As described in section [3j the discrete operators are simple matrices. 
Like with most discretizations of partial differential equations, the resulting 
matrices are sparse. By using a modern sparse matrix library, the equations 
can be written naturally in a very compact and natural manner. Here, the 
Eigen library 02 was utilized. However, alternative libraries like Armadillo 
[13| could be substituted effortlessly, provided a suitable sparse matrix solver 


is available. As an example, the C++ code calculating equation (13) can be 
written 


B -= dt * curl_E * E; 

using a sparse matrix curl_E (Yee-scheme, curl_center in other scheme) 
that was prepared at the beginning of the simulation run according to the 
discretization rules outlined above. Combinations of operators are similarly 
just multiplications in the code. 

Currently, only periodic boundary conditions are implemented. Taking 
the simulation volume to be a three-dimensional box of size N x x N y x N z , 
the scalar, vector, and tensor fields become regular vectors of size N x N y N z , 
3N x N y N z , 9N x N y N z , respectively. Accordingly, equation ( fTT| ) is a 3N x N y N z x 
3N x N y N z square sparse matrix operating on E n+1 with the right-hand side 
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also being a vector of size 3N x N y N z . Since this is a straightforward matrix 
equation, any solver capable of solving a sparse, non-symmetric, square ma¬ 
trix system can be employed to solve for the advanced electric held. As in |Tj, 
the generalized minimal residual method (GMRES) [6] was chosen for this 
task. GMRES is implemented as an unsupported module in Eigen3. Switch¬ 
ing to a different solver (e.g. the biconjugate gradient stabilized method 
BiCGSTAB p3j) is possible, although no such alternatives have been ex¬ 
plored yet. 

Solving the nonlinear equation (14) for the time-advanced velocity is 
achieved using a Newton-Krylov method. A regular GMRES algorithm EttS] 
is modified by replacing the matrix-vector-products with scaled numerical dif¬ 
ferences as described in [16] to yield the inner loop of the Newton iteration. 
Moreover, a basic line search method is employed to improve convergence [7]. 

Additionally, the code can operate in parallel with the available pro¬ 
cessors each assigned a box of about equal size. Equation ([TTJ) is iterated 
via a regular Schwarz domain decomposition method ra using ghost cells. 
Data transfer between processors is mediated by the message passing inter¬ 
face (MPI). Parallel output, provided by the HDF5 library, is available for 
particles and grid quantities. 

Performance is an important feature of a particle in cell code. Currently, 
with only little optimization work done, the code performs about an order of 
magnitude fewer particle updates per second when compared to an optimized 
code like ACRONYM [18j . Due to many force interpolations being performed 
during the Newton-Krylov iteration using a TSC form factor, performance 
necessarily suffers when compared to an explicit scheme. In fact, even with 
few particles per cell the computational time is mostly spent in the particle 
solver. This does, of course, also depend on the number of Newton itera¬ 
tions and the tolerance specified for the residual reduction. Taking this fact 
into account, the code performance seems adequate, especially considering 
the benefits of an implicit scheme. Further optimization is certainly always 
desirable. 

Memory requirements are highly dependent on the particle and cell count. 
Each cell requires about 40 64 bit floating point values being stored on the 
grid. Additionally, more than 100 values per cell are needed for the sparse 
matrix operators (Yee, more for the other scheme). This amounts to approx¬ 
imately 2KiB of storage per cell (including ghost cells). A single particle 
consists of seven 64 bit values resulting in 56 B per particle. 



The code is written in pnre C++ while adhering to idioms like RAII (Re¬ 
source Acquisition Is Initialization), avoiding pointers and explicit memory 
management. These principles allow for very compact code when coupled 
with modern C++11 language and standard library features. Taking advan¬ 
tage of a matrix library like Eigen also significantly benefits compactness and 
simplicity. As a consequence, the full codebase is under 2000 lines of code, as 
counted by the CLOC [19] utility (duplicated code for the two discretization 
schemes was ignored). MPI communication, HDF output, and the opera¬ 
tor setup necessarily make up a large part of the code but are relatively 
straightforward. 

Successful test runs on the SuperMUC supercomputing system were per¬ 
formed on up to 512 cores (32 nodes). Weak scaling performance is plotted 
in fig. [l] and [2} The total number of particle updates is a product of the 
cell count, particles per cell, and number of timesteps. For the plots, this 
number is divided by the total wall clock time or total CPU-time respec¬ 
tively. Therefore, it is a performance measure of the complete update cycle. 
With a PiC code’s computing requirements being mostly determined by the 
total number of particles in the simulation volume, this quantity allows a 
direct comparison between different codes. In this case, the number of field 
iterations is fixed and the number of Newton iterations per particle is ap¬ 
proximately constant over the simulation. During the benchmark runs with 
20 particles per cell, the source term accumulation, field calculation, and 
particle update substeps each take up roughly 20%, 15%, 65% of the time, 
respectively. Only small deviations from ideal scaling, extrapolated from 16 
cores (1 node), are visible. 

5. Implementation 

In this section, a description of the steps executed during a production 
run is given. 

5.1. Initialization 

At first, a configuration hie called config in the working directory is 
parsed. Its format will be described in a different section below. Provided 
in the configuration hie are physical parameters like the electron plasma 
frequency, simulation details like the desired number of timesteps and cells 
in each direction, and program settings like the number of MPI processes 
distributed along each direction. Next, the MPI communication is set up 


9 


Particle updates / second 


1 e+08 


1 e+07 


1e+06 - 


100000 


10 


Weak scaling 
Ideal scaling 


i_i_ 

100 

Parallel Tasks 


1000 


Figure 1: Weak scaling on 1, 2, 4, 8, 16, and 32 SuperMUC nodes. 48 x 48 x 32 cells per 
core were arranged along the z-axis with 20 particles per cell. 
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Figure 2: Loss of efficiency in the benchmark runs of fig. [T] The behavior between 32 
and 256 cores is reproducible and seems to be a result of SuperMUC topology and task 
placement on the machine. 
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and the simulation volume is divided among processes. Furthermore, the 
HDF5 hie output. h5 and a plain text hie energy_output .dat is created by 
rank 0. Important parameters like the length of a timestep and the size of a 
cell are calculated and the layout of the sparse matrix operators is prepared. 
This concludes the programmatic setup sequence. 

With the parameters given in the configuration hie, the magnetic and 
electric held configuration and particle distributions are initialized. For sim¬ 
plicity, a simple standard setup for a drifting Maxwellian particle distribution 
in a constant background magnetic held is available. Particles are created 
using data from a pseudo-random number generator with a given seed, al¬ 
lowing for repeatable simulation runs. Standard C++11 facilities are used 
to draw normally distributed velocity components with a given variance for 
each particle. Similarly, a uniform distribution provides the particle location 
within a given cell. 

The physical setup ends with a hrst output of the data. 

5.2. Timestep 

In this section, a single simulation timestep is described in detail. Algo¬ 
rithm [l] roughly outlines the simulation cycle in pseudocode. 

Initially, the source grid quantities for pressure, dielectric tensor, charge 
density and current density are set to zero. For each particle in the simula¬ 
tion volume, the held quantities at its current position are interpolated and 
its contribution to the source terms calculated. This contribution is then 
deposited onto the grid according to the chosen grid scheme. After all the 
terms are summed up, the values stored on ghost cells are communicated 
between neighboring processes. At this point, the quantities p, j, ft, and fi 
are known on the grid. Additionally, a sparse matrix operator is created for 
the dielectric tensor and a helper variable for V • II is stored. An optional 
smoothing step is available for the charge density. This step can improve 
energy conservation and reduce noise in some cases. 

Given the source terms accumulated above, Maxwell’s equations can be 
solved implicitly. Fields from the previous timestep are renamed as they are 
still needed after the update. Using the differential operators prepared during 
setup, a GMRES solver is initialized for the electric field update. Moreover, 
a vector describing the right-hand side is calculated from the source terms. 
Taking the current electric field values as an initial guess, a GMRES step 
is taken. After the GMRES step, the border values of the electric field are 
exchanged among neighboring processes. This refinement and distribution of 
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Algorithm 1 Update cycle 
1: repeat 

2: for all particles do 

3: collect charge_density 

4: collect current_density 

5: collect pressure 

6: collect dielectric 

7: end for 

8: distribute source terms among processes 

9: E_old 4— E 

10: B_old 4— B 

11: prepare operator matrix 

12: prepare right-hand-side vector 

13: for i 4— 0, maxJters do 

14: E 4— GMRES(operator, right-hand-side, guess=E) 

15: distribute E among processes 

16: end for 

17: B B - dt * curl_E * E 

18: for all particles do 

19: while residual > tolerance do 

20: u_new, residual 4— NK(E, B_old, u_new) 

21: end while 

22: x 4— x + dt * (u + u_new) / (g + g_new) 

23: end for 

24: distribute particles among processes 

25: E 4- (E - (1-THETA) * E_old)/THETA 

26: until simulation finished 


> eq. 

> eq. 

> eq. 

> eq. 


Q 

(7) 

( 8 ) 
(9) 


> eq. (11) 

> eq. (11) 


> eq. 

> eq. 

> eq. 

> eq. 


(13) 


0 ) 

0 ) 


( 12 ) 
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the electric field is repeated for a fixed number of iterations since the solution 
seems to converge rapidly. For large timesteps and cell sizes, more iterations 
might be required here. Having thus evaluated E n+e , the magnetic field 
can be updated and synchronized among the processes. Due to the filtering 
parameter 0, a linear extrapolation of E n+1 is performed later (see below). 

Thus, with E n+e and B n+1 known, the particle velocities and positions 
need to be updated. For each particle, a solution for its time-advanced ve¬ 
locity is calculated in a Newton-Krylov procedure, ft uses evaluations of the 
residual to continuously prepare numerical approximations to the Jacobian 
matrix. The Jacobian matrix is used in a GMRES iteration to obtain a new 
approximation for a Newton step. This step is then applied using a simple 
line search algorithm and a new GMRES iteration is performed. After a 
set number of steps, the particle is moved with the velocity minimizing the 
residual. The residual itself is evaluated by interpolating the forces acting at 
the particle’s location and solving equation (14). 

Unfortunately, Newton-Krylov procedures are not guaranteed to con¬ 
verge. Similarly, catastrophic failures of the solver cannot be ruled out in 
a production run with billions or even trillions of particle updates. Con¬ 
sequently, velocities returned by the solver are checked for validity. If an 
invalid velocity (NaN) is encountered or the Newton iteration fails to con¬ 
verge, the particle is moved using its previous value instead and a warning is 
printed. A few isolated corrections should not greatly influence a sufficiently 
large simulation. Extensive testing revealed that the Newton-Krylov algo¬ 
rithm is sensitive to floating point round-off issues. For this reason, the code 
performs better on processors implementing a fused multiply-add instruc¬ 
tion (e.g. FMA4 on AMD Bulldozer 2011 and later, FMA3 on Intel Haswell 
2013 and later), assuming it is supported by the compiler. While measures 
where taken to mitigate these issues, simulations on processors listed above 
show faster convergence and fewer catastrophic iteration failures (see also 
Since fewer iterations are needed in that case, performance also in¬ 
creases. Taking the above into account, the particle solver should always be 
monitored and its parameters adjusted if necessary. 

Concluding the physical part of a timestep, the electric field is extrapo¬ 
lated to yield the proper values needed for the next iteration as per equation 

©• 

Diagnostic output of the total energy in a simulation is written to the 
text file energy_output.dat after each step by rank 0. Field and particle 
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output is optional after each step. During field output, rank 0 first creates a 
skeleton group structure in output.h5. Then, all processes write their held 
data collectively. For particle output, a new folder is created. Each processor 
writes its particle data to a separate hie in the folder using the packet table 
interface. 

6. Usage 

6.1. Compilation and Execution 

A self-explanatory makefile is provided. Ideally, the HDF5 environment is 
set up properly, so that the h5c++ command wraps mpiCC (or equivalent). 
That way, all the necessary libraries and header hies for MPI and HDF 
are taken care of automatically. Eigen3 headers are assumed to reside in 

/usr/include/eigen3. 

The resulting binary is executed via mpiexec (or equivalent). The number 
of processes needs to be specihed via the standard methods of the MPI im¬ 
plementation (e.g. OpenMPI: mpiexec -np 4 . . ./imp for four processes). 

6.2. Configuration 

Configuration data for the code is read from a hie conf ig in the working 
directory. Its format is one item key=value per line without any other white 
space. The key is a character string and the value is a double precision 
boating point value. Available configuration options are listed in table [l] and 
an example hie is provided with the code. 

6.3. Creating a new setup 

Creating an initialization procedure sufficiently powerful and general to 
cover most use cases using only a configuration hie is an awkward proce¬ 
dure. Additional parsing procedures, rules and exceptions can quickly lead 
to ballooning complexity and in many cases, the code needs to be edited any¬ 
way. Consequently, the present code only provides a setup using up to two 
homogeneous drifting Maxwellian distributions. If a more complex setup is 
needed, it is simple to create a new configuration by editing the source code 
directly. Since the existing hies are commented, they should be used as a 
reference. 

The functions init_. . . are executed at the start of a simulation in the 
order parameters, sources, fields, and particles. Cell size dx, Particle: :dx 
(static variable) timestep length dt, Particle: :dt (static variable) need to 
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Key 

Unit 

Description 

seed 

- 

Seed for the random number generator 

total_steps 

- 

Number of timesteps to be executed 

N_x, (y, z) 

- 

Number of cells in x, y, z direction 

procs_x (y, z) 

- 

Number of processors distributed along the 
x, y, z axes 

plasma_freq 

rad/s 

Electron plasma frequency 

mp_over_me 

- 

Proton mass divided by electron mass 

width_bg 

c 

Width of a velocity component in thermal 
distribution (background) 

width_jet 

c 

Width of a velocity component in thermal 
distribution (jet) 

num_*_bg (e, P. p) 

- 

Number of macro- e~ , p, e + per cell (back¬ 
ground) 

num_*_jet (e, P. p) 

- 

Number of macro- e~ , p, e + per cell (jet) 

v_*_bg (x, y, z) 

c 

Particle drift velocity in x, y, z direction 
(background) 

v_*_jet (x, y, z) 

c 

Particle drift velocity in x, y, z direction (jet) 

B0_x (y, z) 

T 

Initial magnetic held in x, y, z direction 

rescale_dx 

- 

Cell size dx = Ad/v^ will be divided by this 
factor 

rescale_dt 

- 

Time step dt = dx /(\/3c) will be divided by 
this factor 

theta 

- 

Filtering parameter 0 G [0.5,1.0] 

out_p 

- 

Particle data is written every ... steps 

out_q 

- 

p is written every ... steps 

out_j_x (y, z) 

- 

J x / y /z is written every ... steps 

out_E_x (y, z) 

- 

E x / y / z is written every ... steps 

out_B_x (y, z) 

- 

-Bx/y/z is written every ... steps 

smooth_charge 

- 

A binomial filter will be applied to p if this 
is >= 1 


Table 1: Configuration options. 
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be set in one of these functions. Moreover, the particle data (charge q, mass 
m, /3/dt, number density n) contained in the Particle: :p array should be 
defined properly for electrons (index 0), protons (index 1), and positrons 
(index 2). 

By looping from 0 to Nx, 0 to Ny, 0 to Nz, every cell in the simulation 
volume can be indexed. For access to vector held components (e.q. E x ), 
the helper function vindg is used with indices x,y,z and the component c 
(0=x,l=y,2=z). Scalar fields (if needed) are accessed with sindg and indices 
x,y,z. Both helper functions map the three cell indices to a single integer 
index, taking into account ghost cells automatically. 

Particles are simply added to the parts vector. The Tag union is pro¬ 
vided to distinguish particles along with their properties. It is important to 
set the flav held to one of the FLAV0R_. . . values since this determines the 
properties of the particle. A held population is provided in addition to two 
integers id_in_cell, start_cell_index allowing for unique particle IDs, if 
needed. Such an ID can be created by calculating the unique global index of 
the current cell and storing that value in start_cell_index. Different parti¬ 
cles in the same cell receive incrementing values in id_in_cell. Population 
identifiers (P0P_. . .) are optional and serve to differentiate populations of 
the same particle type (e.g. jet or background). 

New variables needed during initialization may be put into the Simulation 
class in header simulation.h. If other particle types or populations are 
needed as well, new FLAV0R_. . . and / or P0P_. . . entries should be added 
in hie particle.h. The FLAV0R_. . . constants serve as indices into the 
properties array. Therefore, the property array needs to be large enough to 
contain the particle properties. In particle. cpp, the Particle: :p[] ini¬ 
tialization needs to be extended accordingly. As mentioned, the hnal particle 
properties need to be set up in one of the init functions. 

Parameters read from the configuration hie are accessible through the 
parameters hash table and indexed using the key used in the hie. 

6.J h Processing output 

In output.h5 held data for the simulation is stored. For each output 
timestep, a new group is created, e.g. Timestep_0. In this group, new groups 
for each held vector are created, e.g. E. Finally, datasets for its components 
are created in this group, e.g. EO. The full path for E x at timestep 0 would be 
/Timestep_0/E/E0, accordingly. The program code includes a short Python 
script that creates a dispersion plot from an output hie. 
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Similarly, particles at timestep 0 are stored in a folder called particles_0. 
Each CPU creates its own file named cpu_0, for example. The hies contain 
a standard HDF5 packet table consisting of the complete particle data (f, u, 
ID). By parsing the ID held, particle details like the type can be extracted. 
A Python example script is provided for the creation of a histogram from 
particle output. 

Convenient Python modules for the handling of the output datasets are 
NumPy [21], Scip y [22], PyTables [23] and matplotlib [2T]. Many utilities 
can handle HDF5 natively so Python need not be used for post-processing. 
Visit, for example, can plot the held output without further preparation [25 ] . 

7. Simulation setup 

The capabilities of the PiC code presented here shall be highlighted with 
a few examples. 

7.1. Wave dispersion 

One important aspect of PiC codes is the interaction of particles and 
holds. A simple way to test whether these interactions are rehected properly 
is the generation and propagation of plasma waves. Although they are relying 
on the complex interplay of particles and waves, they are well understood in 
terms of theory. It is therefore possible to compare the waves’ properties to 
known expressions [25] . 

A thermal magnetized plasma generated randomly will always contain 
several wave modes with dispersion relations that can be evaluated analyti¬ 
cally. This simple setup is realized by creating a homogeneous distribution 
of electrons and protons with Maxwellian velocity distributions. Depending 
on the wave mode, the runtime of the simulation can be chosen in order to 
resolve the lowest frequencies of interest. Similarly, the number of cells in 
each direction is chosen according to the largest wavelength to be resolved 
along the respective axis. 

The relevant parameters used in these simulations are summarized in 
table [2} Filtering parameters 0 = 0.5 and 0 = 1.0 were tested. In the 
former simulation the charge density was smoothed once using a binomial 
filter. No further smoothing was performed in these simulations. 

To analyze the dispersion relation along an axis, the simulation fields 
are integrated over the perpendicular directions and Fourier-transformed in 
space and time. Thereby, u(kj) plots can be obtained along an axis i for 
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thermal velocity (electrons) 

initial magnetic field 

electron plasma frequency 

Debye length 

mass ratio 

timesteps 

length of timestep 

simulation volume 

cell edge length 


^th,e 

B 0 

^pe 

Ad 

m p /m e 

N t 

At 

N x x N y x N z 

Ax 


0.05c 
(0.5, 0, 0) mT 
2.0 • 10 8 rad/s 
7.5 cm 
10 
4000 
4.1 • 1CT 10 s 
256 x 16 x 16 
26 cm 


Table 2: Parameters of the wave dispersion simulations. 


all quantities represented on the computational grid. For comparison, an 
identical simulation was performed using our existing implicit code m 
in % H a dispersion plot of the y-component of the electric field is shown 
with curves representing the theoretical dispersion relations overlaid. As can 
be seen, the wave modes are reproduced properly, with the electromagnetic 
mode showing a characteristic resonance at high k, owing to the finite grid 
used in PiC codes. A comparison to fig. [4] shows very similar behavior of 
the two codes. Notably, the new field solver leads to a EM-mode resonance 
at slightly lower u. Fig. [5] demonstrates temporal filtering with 0 = 1.0. 
Clearly, the electromagnetic wave is almost completely removed from the 
simulation. Other wave modes are not affected, however, weak harmonics of 
the electron gyrofrequency are visible as noise. 

Energy conservation is not exact in this type of implicit PiC codes [28] 
whereas in energy-conserving PiC codes, the momentum conservation is vi¬ 
olated [29]. Ideally, the total simulation energy decreases slowly since an 
increase might lead to instability. The 0 = 0.5 simulation and SOIDBERG 
both showed a decrease of about 1%. 0 = 1.0 led to an energy loss of about 
0.4%. 

7.2. Filamentation instability 

Instabilities are another common phenomenon sensitive to particle-wave 
interactions. The filamentation instability in particular is one of the earlier 
applications of PiC codes. Above all, the generation of strong magnetic fields 
is a good indicator of whether the code performs as expected [IB]. 

For demonstration purposes, we considered a fully three-dimensional and 
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Figure 3: E y (k x ,u)) dispersion plot for new implicit code. For this simulation, the charge 
density was smoothed spatially. 0 = 0.5. 
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14 


10 ° 



Figure 4: E y (k x ,u}) dispersion plot for SOIDBERG. 


relativistic filamentation instability. The parameters for the two counter¬ 
streaming particle distributions are given in table [3j 

Instabilities generally proceed in an exponential fashion. In this case, the 
magnetic field perpendicular to the streaming direction should increase ex¬ 
ponentially |30j . From the energy output of the simulation, fig. [6j it is clear 
that an instability develops. Volumetric plots of the perpendicular magnetic 
held sJB'i + Bl (fig. a show the rapid evolution of spatial structures. Out of 
a random magnetic held created by thermal fluctuations, filaments develop 
parallel to the streaming direction. Over the course of the simulation these 
filaments merge, forming larger structures and effecting significant perpen¬ 
dicular magnetic held amplitudes. After the instability reaches a saturation 
point, these strong holds decay slowly. 

In addition to the the creation of strong helds, particle acceleration is 
expected to occur EH For a histogram of the electron energy distribution 
before and after the simulation see hg. [8j Initially Maxwellian, the particle 
energy forms a high energy tail during filamentation. Most particles’ kinetic 
energy, however, is converted to magnetic held energy. 
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Figure 5: E y (k x ,uj) dispersion plot for new implicit code. No spatial smoothing. 0 = 1.0. 


thermal velocity (electrons) 

stream velocity 

electron plasma frequency 

Debye length 

mass ratio 

timesteps 

length of timestep 

simulation volume 

cell edge length 


t^th,e 

V 

Ad 

m p /m e 

N t 

At 

N x x N y x N z 

Ax 


0.1 c 
±0.995c 
1.0 • 10 5 rad/s 
3.0 • 10 4 cm 
2 

500 

2.0 • 10 -6 s 
128 x 32 x 32 
1.1 • 10 5 cm 


Table 3: Parameters of filamentation simulation. 
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Figure 6: Energy output of the filamentation simulation. 

8. Concluding remarks 

A production-ready and, above all, simple implementation of the rela¬ 
tivistic moment implicit particlc-in-ccll algorithm was presented. Utilizing 
advanced techniques like GMRES and Newton-Krylov, PICPANTHER is 
well-equipped for the simulation of difficult plasma physics problems. More¬ 
over, the concise and straightforward style allows for easy understanding and 
expansion of the code. 

Several test cases were examined and the code’s properties determined. 
The simulations accurately reflect the physical processes, even in extreme 
cases like the filamentation instability. Performance was found to be decent 
and scalability tests on SuperMUC suggest good weak scaling behavior. 
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Figure 7: Time series of volumetric plots of the perpendicular magnetic field y.+ B% 

for the filamentation simulation. T = 5/o;p e , T = 15/u;pe, T = 20/ui pe , T = 2b/uj pe , 
T = 35/o;p e , T = 100/cj pe (from left to right, top to bottom). 
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Figure 8: Initial and final electron energy distribution for the filamentation simulation. 
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